On U-processes and clustering performance
Abstract
Many clustering techniques aim at optimizing empirical criteria that take the form of a U-statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within-cluster point scatter over a class of partitions of the feature space. The purpose of this paper is to define a general statistical framework, relying on the theory of U-processes, for studying the performance of such clustering methods. In this setup, under adequate assumptions on the complexity of the subsets forming the candidate partitions, the excess clustering risk is proved to be of order O_P(1/√n). Building on recent results on the tail behavior of degenerate U-processes, it is also shown how tighter rate bounds can be established. Model selection issues, in particular the choice of the number of clusters forming the data partition, are also considered.
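As a rough illustration of the criterion described above, the sketch below computes the empirical within-cluster point scatter of a partition as a U-statistic of degree two: the average, over all n(n-1)/2 pairs of observations, of the dissimilarity D(X_i, X_j), where a pair contributes only when both points fall in the same cell. It then minimizes this empirical risk over a deliberately tiny class of candidate partitions (two-cell splits of the real line at a threshold). The squared-distance dissimilarity, the threshold class, and all function names (`within_cluster_scatter`, `threshold_partition`, `erm_threshold`) are hypothetical choices made for this example, not taken from the paper; the paper's framework covers general dissimilarities and richer partition classes.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


# Hypothetical dissimilarity for this sketch: squared distance on the real line.
def squared_distance(x: float, y: float) -> float:
    return (x - y) ** 2


def within_cluster_scatter(
    xs: Sequence[float],
    labels: Sequence[int],
    d: Callable[[float, float], float] = squared_distance,
) -> float:
    """Empirical clustering risk written as a U-statistic of degree two.

    Averages d(x_i, x_j) over all pairs i < j; pairs whose points lie in
    different cells (different labels) contribute zero to the sum.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(
        d(xs[i], xs[j])
        for i, j in combinations(range(n), 2)
        if labels[i] == labels[j]
    )
    return 2.0 * total / (n * (n - 1))


def threshold_partition(xs: Sequence[float], t: float) -> List[int]:
    """Hypothetical two-cell partition: cell 0 is (-inf, t], cell 1 is (t, inf)."""
    return [0 if x <= t else 1 for x in xs]


def erm_threshold(
    xs: Sequence[float],
    d: Callable[[float, float], float] = squared_distance,
) -> Tuple[float, float]:
    """Empirical risk minimization over the threshold class.

    Any threshold inside a gap between consecutive sorted values induces the
    same split, so the midpoints of those gaps enumerate every candidate.
    """
    s = sorted(set(xs))
    if len(s) < 2:
        raise ValueError("need at least two distinct values")
    candidates = [(a + b) / 2 for a, b in zip(s, s[1:])]
    best_t = min(
        candidates,
        key=lambda t: within_cluster_scatter(xs, threshold_partition(xs, t), d),
    )
    return best_t, within_cluster_scatter(xs, threshold_partition(xs, best_t), d)


if __name__ == "__main__":
    sample = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
    t, risk = erm_threshold(sample)
    print(f"threshold={t:.2f} empirical risk={risk:.4f}")
```

On this toy sample, the minimizer separates the two groups at the midpoint 2.6. In the paper's setting, the statistical question is how far the true clustering risk of such an empirical minimizer can be from the best achievable risk in the partition class, which the abstract states is of order O_P(1/√n) under complexity assumptions.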
Similar resources
A statistical view of clustering performance through the theory of U-processes
Many clustering techniques aim at optimizing empirical criteria that are of the form of a U-statistic of degree two. Given a measure of dissimilarity between pairs of observations, the goal is to minimize the within-cluster point scatter over a class of partitions of the feature space. It is the purpose of this paper to define a general statistical framework, relying on the theory of U-process...
Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the tremendous growth of spatial trajectory data, together with the need to process it and extract useful information and meaningful patterns, has attracted many researchers to the field of spatio-temporal trajectory clustering. Processing and analyzing these trajectories has resulted in the extraction of useful information whic...
A Fuzzy Model for Assessment Processes
The methods for assessing individuals' performance that are usually applied in practice are based on the principles of bivalent (yes-no) logic. However, fuzzy logic, being multivalued by nature, offers a wider and richer set of resources for this purpose. In this paper we use the principles of fuzzy logic to develop a new method for assessing the performance of groups of individua...
A Novel Clustering Approach for Estimating the Time of Step Changes in Shewhart Control Charts
Although control charts are commonly used to monitor process changes, they usually do not indicate the real time of the changes. Identifying the real time of process changes is known as the change-point estimation problem. There are a number of change-point models in the literature; however, most of the existing approaches are dedicated to normal processes. In this paper we propose a novel app...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: entropy, the Gini index and the DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and a particle swarm optimization algorithm is used. This ...